- Convolutional Neural Network (CNN)
- Encoder-Decoder
- Transformers
Main CNN idea for text:
Compute vectors for n-grams and combine them afterwards.
Example: for “this takes too long”, compute vectors for:
“this takes”, “takes too”, “too long” (bigrams), “this takes too”, “takes too long” (trigrams), and “this takes too long” (the full 4-gram)
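The n-grams above can be enumerated with a short helper (the function name and shape of the output are illustrative, not from the notes):

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "this takes too long".split()
grams = [g for n in (2, 3, 4) for g in ngrams(tokens, n)]
# grams now contains the three bigrams, two trigrams, and one 4-gram
```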
The Sequential model is used to build a linear stack of layers.

```python
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
```
Note:
- Dense is the fully connected layer;
- Flatten is used after all CNN layers and before the fully connected layer;
- Conv2D is the 2D convolution layer;
- MaxPooling2D is the 2D max pooling layer;
- SGD is the stochastic gradient descent algorithm.
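A minimal sketch of how these layers fit together in a Sequential model (the input shape, filter counts, and class count are illustrative assumptions, not from the notes):

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from keras.optimizers import SGD

model = Sequential([
    Input(shape=(28, 28, 1)),                         # e.g. 28x28 single-channel inputs
    Conv2D(32, (3, 3), activation="relu"),            # 2D convolution layer
    MaxPooling2D((2, 2)),                             # 2D max pooling layer
    Flatten(),                                        # after the CNN layers, before Dense
    Dense(10, activation="softmax"),                  # fully connected output layer
])
model.compile(optimizer=SGD(learning_rate=0.01),
              loss="categorical_crossentropy")
```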
Most significant change (for RNNs): a new set of weights, U, connects the hidden layer from the previous time step to the current hidden layer. These weights determine how the network makes use of past context when calculating the output for the current input.
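One simple-RNN step can be sketched in NumPy as follows; the dimensions, the tanh nonlinearity, and the variable names are illustrative assumptions:

```python
import numpy as np

d_in, d_h = 4, 3                     # illustrative input and hidden sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(d_h, d_in))     # input -> hidden weights
U = rng.normal(size=(d_h, d_h))      # previous hidden -> hidden weights (the new set)
b = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    """h_t = tanh(W x_t + U h_{t-1} + b): past context enters through U."""
    return np.tanh(W @ x_t + U @ h_prev + b)

h = np.zeros(d_h)                    # initial hidden state
for x_t in rng.normal(size=(5, d_in)):   # unroll over a 5-step input sequence
    h = rnn_step(x_t, h)
```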
Abstracting away from these design choices:
Widely used encoder design: stacked Bi-LSTMs. The contextualized representation for each time step is the pair of hidden states from the top layer's forward and backward passes.
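A stacked Bi-LSTM encoder can be sketched in Keras like this (vocabulary size, embedding dimension, and hidden sizes are illustrative assumptions):

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Bidirectional

encoder = Sequential([
    Input(shape=(None,)),                             # variable-length token-id sequences
    Embedding(input_dim=5000, output_dim=64),         # token embeddings
    Bidirectional(LSTM(32, return_sequences=True)),   # lower Bi-LSTM layer
    Bidirectional(LSTM(32, return_sequences=True)),   # top layer: per-step forward+backward states
])
# The output concatenates forward and backward hidden states at each time step.
```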
A transformer adopts an encoder-decoder architecture.
Transformers were developed to solve the problem of sequence transduction, such as neural machine translation: any task that transforms an input sequence into an output sequence.
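The core operation inside both the encoder and decoder is scaled dot-product attention, which can be sketched in NumPy (shapes and names here are illustrative, not a full transformer implementation):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row-wise per query."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))   # one value vector per key
out = attention(Q, K, V)      # one output vector per query position
```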
These are valuable resources for learning more about the architecture and implementation:
https://jalammar.github.io/illustrated-transformer/ (slides come from this source)
Write with Transformer: https://transformer.huggingface.co/
Talk to Transformer: https://app.inferkit.com/demo
Transformer model for language understanding: https://www.tensorflow.org/text/tutorials/transformer